(57) ABSTRACT

A frame erasure compensation method in a variable-rate 
speech coder includes quantizing, with a first encoder, a 
pitch lag value for a current frame and a first delta pitch lag 
value equal to the difference between the pitch lag value for 
the current frame and the pitch lag value for the previous 
frame. A second, predictive encoder quantizes only a second 
delta pitch lag value for the previous frame (equal to the 
difference between the pitch lag value for the previous frame 
and the pitch lag value for the frame prior to that frame). If 
the frame prior to the previous frame is processed as a frame 
erasure, the pitch lag value for the previous frame is 
obtained by subtracting the first delta pitch lag value from 
the pitch lag value for the current frame. The pitch lag value 
for the erasure frame is then obtained by subtracting the 
second delta pitch lag value from the pitch lag value for the 
previous frame. Additionally, a waveform interpolation 
method may be used to smooth discontinuities caused by 
changes in the coder pitch memory. 
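The lag recovery described above amounts to walking backward from the current frame, undoing one delta per frame. The following is a minimal Python sketch under stated assumptions: the function name `recover_pitch_lags`, the newest-first ordering of `deltas`, and the sample values are hypothetical and are not taken from the specification.

```python
def recover_pitch_lags(current_lag, deltas):
    """Walk backward from the current frame, undoing one delta per frame.

    current_lag: pitch lag decoded for the current (good) frame.
    deltas: delta pitch lags ordered newest first (an assumed layout);
            deltas[0] is lag(current) - lag(previous), deltas[1] is
            lag(previous) - lag(frame before it), and so on.
    Returns the recovered lags, newest first, excluding current_lag.
    """
    lags = []
    lag = current_lag
    for delta in deltas:
        lag = lag - delta  # lag(n-1) = lag(n) - delta(n)
        lags.append(lag)
    return lags


# Hypothetical values: current lag 52, first delta +2, second delta -3.
previous_lag, erased_lag = recover_pitch_lags(52, [2, -3])
print(previous_lag, erased_lag)  # 50 53
```

The last element of the returned list corresponds to the erased frame when one delta is available for every frame between the erasure and the current frame.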

27 Claims, 11 Drawing Sheets

FRAME ERASURE COMPENSATION
METHOD IN A VARIABLE RATE SPEECH
CODER

BACKGROUND OF THE INVENTION 

I. Field of the Invention 

The present invention pertains generally to the field of 
speech processing, and more specifically to methods and 
apparatus for compensating for frame erasures in variable- 
rate speech coders. 

II. Background 

Transmission of voice by digital techniques has become 
widespread, particularly in long distance and digital radio 
telephone applications. This, in turn, has created interest in 
determining the least amount of information that can be sent 
over a channel while maintaining the perceived quality of 
the reconstructed speech. If speech is transmitted by simply 
sampling and digitizing, a data rate on the order of sixty-four 
kilobits per second (kbps) is required to achieve the speech
quality of a conventional analog telephone. However, through
the use of speech analysis, followed by the appropriate 
coding, transmission, and resynthesis at the receiver, a 
significant reduction in the data rate can be achieved. 
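As a rough check on the sixty-four kbps figure, the sketch below multiplies a sampling rate by a sample width. The 8 kHz rate and 8 bits per sample are assumptions typical of telephone-band PCM; the paragraph above does not state them, and the function name is hypothetical.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Raw bit rate of uncompressed sampled speech, in kilobits per second."""
    return sample_rate_hz * bits_per_sample / 1000.0


# Assumed telephone-band PCM: 8000 samples/s at 8 bits per sample.
print(pcm_bit_rate_kbps(8000, 8))  # 64.0
```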

Devices for compressing speech find use in many fields of 
telecommunications. An exemplary field is wireless com- 
munications. The field of wireless communications has 
many applications including, e.g., cordless telephones, 
paging, wireless local loops, wireless telephony such as 
cellular and PCS telephone systems, mobile Internet Proto- 
col (IP) telephony, and satellite communication systems. A 
particularly important application is wireless telephony for 
mobile subscribers. 

Various over-the-air interfaces have been developed for 
wireless communication systems including, e.g., frequency 
division multiple access (FDMA), time division multiple 
access (TDMA), and code division multiple access 
(CDMA). In connection therewith, various domestic and 
international standards have been established including, e.g., 
Advanced Mobile Phone Service (AMPS), Global System 
for Mobile Communications (GSM), and Interim Standard 
95 (IS-95). An exemplary wireless telephony communica- 
tion system is a code division multiple access (CDMA) 
system. The IS-95 standard and its derivatives, IS-95A,
ANSI J-STD-008, IS-95B, proposed third generation
standards IS-95C and IS-2000, etc. (referred to collectively
herein as IS-95), are promulgated by the Telecommunication 
Industry Association (TIA) and other well known standards 
bodies to specify the use of a CDMA over-the-air interface 
for cellular or PCS telephony communication systems. 
Exemplary wireless communication systems configured 
substantially in accordance with the use of the IS-95
standard are described in U.S. Pat. Nos. 5,103,459 and
4,901,307, which are assigned to the assignee of the present
invention and fully incorporated herein by reference. 

Devices that employ techniques to compress speech by 
extracting parameters that relate to a model of human speech 
generation are called speech coders. A speech coder divides 
the incoming speech signal into blocks of time, or analysis 
frames. Speech coders typically comprise an encoder and a 
decoder. The encoder analyzes the incoming speech frame to 
extract certain relevant parameters, and then quantizes the 
parameters into binary representation, i.e., to a set of bits or 
a binary data packet. The data packets are transmitted over 
the communication channel to a receiver and a decoder. The 
decoder processes the data packets, unquantizes them to
produce the parameters, and resynthesizes the speech frames
using the unquantized parameters.

The function of the speech coder is to compress the
digitized speech signal into a low-bit-rate signal by
removing all of the natural redundancies inherent in speech.
The digital compression is achieved by representing the input
speech frame with a set of parameters and employing
quantization to represent the parameters with a set of bits. If
the input speech frame has a number of bits $N_i$, and the data
packet produced by the speech coder has a number of bits
$N_o$, the compression factor achieved by the speech coder is
$C_r = N_i/N_o$. The challenge is to retain high voice quality of
the decoded speech while achieving the target compression
factor. The performance of a speech coder depends on (1)
how well the speech model, or the combination of the
analysis and synthesis process described above, performs,
and (2) how well the parameter quantization process is
performed at the target bit rate of $N_o$ bits per frame. The goal
of the speech model is thus to capture the essence of the
speech signal, or the target voice quality, with a small set of
parameters for each frame.
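A worked instance of the compression factor may help fix the notation. The sketch below assumes a 160-sample frame of 16-bit linear PCM as the input and a 4 kbps coder emitting one packet per 20 ms frame; these numbers and the function name are illustrative assumptions, not values given in the paragraph above.

```python
def compression_factor(bits_in, bits_out):
    """C_r = N_i / N_o: input frame bits over coded packet bits."""
    if bits_out <= 0:
        raise ValueError("bits_out must be positive")
    return bits_in / bits_out


FRAME_SAMPLES = 160    # assumed: 20 ms at 8 kHz
BITS_PER_SAMPLE = 16   # assumed: linear PCM input
FRAME_MS = 20
CODER_RATE_BPS = 4000  # assumed target rate

n_i = FRAME_SAMPLES * BITS_PER_SAMPLE      # 2560 bits per input frame
n_o = CODER_RATE_BPS * FRAME_MS // 1000    # 80 bits per coded packet
print(compression_factor(n_i, n_o))        # 32.0
```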

Perhaps most important in the design of a speech coder is
the search for a good set of parameters (including vectors)
to describe the speech signal. A good set of parameters
requires a low system bandwidth for the reconstruction of a
perceptually accurate speech signal. Pitch, signal power,
spectral envelope (or formants), amplitude spectra, and
phase spectra are examples of the speech coding parameters.

Speech coders may be implemented as time-domain
coders, which attempt to capture the time-domain speech
waveform by employing high time-resolution processing to
encode small segments of speech (typically 5 millisecond
(ms) subframes) at a time. For each subframe, a
high-precision representative from a codebook space is found
by means of various search algorithms known in the art.
Alternatively, speech coders may be implemented as
frequency-domain coders, which attempt to capture the
short-term speech spectrum of the input speech frame with
a set of parameters (analysis) and employ a corresponding
synthesis process to recreate the speech waveform from the
spectral parameters. The parameter quantizer preserves the
parameters by representing them with stored representations
of code vectors in accordance with known quantization
techniques described in A. Gersho & R. M. Gray, Vector
Quantization and Signal Compression (1992).

A well-known time-domain speech coder is the Code
Excited Linear Predictive (CELP) coder described in L. B.
Rabiner & R. W. Schafer, Digital Processing of Speech
Signals 396-453 (1978), which is fully incorporated herein
by reference. In a CELP coder, the short-term correlations,
or redundancies, in the speech signal are removed by a linear
prediction (LP) analysis, which finds the coefficients of a
short-term formant filter. Applying the short-term prediction
filter to the incoming speech frame generates an LP residue
signal, which is further modeled and quantized with
long-term prediction filter parameters and a subsequent
stochastic codebook. Thus, CELP coding divides the task of
encoding the time-domain speech waveform into the separate
tasks of encoding the LP short-term filter coefficients and
encoding the LP residue. Time-domain coding can be performed
at a fixed rate (i.e., using the same number of bits, $N_o$, for
each frame) or at a variable rate (in which different bit rates
are used for different types of frame contents). Variable-rate
coders attempt to use only the amount of bits needed to
encode the codec parameters to a level adequate to obtain a
target quality. An exemplary variable rate CELP coder is
described in U.S. Pat. No. 5,414,796, which is assigned to
the assignee of the present invention and fully incorporated
herein by reference.
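The short-term LP analysis and the resulting residue can be sketched in a few lines. The code below is a minimal illustration in pure Python, assuming the autocorrelation method with a Levinson-Durbin recursion and a zero sample history; it is not the procedure of the cited CELP coder, and all function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum over n of frame[n] * frame[n - k], for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for predictor taps a[1..order].

    The assumed predictor is s_hat[n] = sum_i a[i] * s[n - i].
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate input (e.g. all zeros); keep zero taps
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]


def lp_residue(frame, taps):
    """Residue e[n] = s[n] - sum_i taps[i-1] * s[n - i], zero history."""
    residue = []
    for n, sample in enumerate(frame):
        prediction = 0.0
        for i, tap in enumerate(taps, start=1):
            if n - i >= 0:
                prediction += tap * frame[n - i]
        residue.append(sample - prediction)
    return residue


# Hypothetical usage: a decaying exponential is well predicted by one tap.
frame = [0.9 ** n for n in range(50)]
taps = levinson_durbin(autocorrelation(frame, 1), 1)
residue = lp_residue(frame, taps)
```

The residue carries far less energy than the input frame in this toy case, which is the property that makes it cheaper to quantize.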

Time-domain coders such as the CELP coder typically
rely upon a high number of bits, $N_o$, per frame to preserve
the accuracy of the time-domain speech waveform. Such
coders typically deliver excellent voice quality provided the
number of bits, $N_o$, per frame is relatively large (e.g., 8 kbps
or above). However, at low bit rates (4 kbps and below),
time-domain coders fail to retain high quality and robust
performance due to the limited number of available bits. At
low bit rates, the limited codebook space clips the
waveform-matching capability of conventional time-domain
coders, which are so successfully deployed in higher-rate
commercial applications. Hence, despite improvements over
time, many CELP coding systems operating at low bit rates
suffer from perceptually significant distortion typically
characterized as noise.

There is presently a surge of research interest and strong 
commercial need to develop a high-quality speech coder 
operating at medium to low bit rates (i.e., in the range of 2.4 
to 4 kbps and below). The application areas include wireless 
telephony, satellite communications, Internet telephony, 
various multimedia and voice-streaming applications, voice 
mail, and other voice storage systems. The driving forces are 
the need for high capacity and the demand for robust 
performance under packet loss situations. Various recent 
speech coding standardization efforts are another direct 
driving force propelling research and development of low- 
rate speech coding algorithms. A low-rate speech coder 
creates more channels, or users, per allowable application 
bandwidth, and a low-rate speech coder coupled with an 
additional layer of suitable channel coding can fit the overall 
bit-budget of coder specifications and deliver a robust per- 
formance under channel error conditions. 

One effective technique to encode speech efficiently at 
low bit rates is multimode coding. An exemplary multimode 
coding technique is described in U.S. application Ser. No. 
09/217,341, entitled VARIABLE RATE SPEECH 
CODING, filed Dec. 21, 1998, assigned to the assignee of 
the present invention, and fully incorporated herein by 
reference. Conventional multimode coders apply different 
modes, or encoding-decoding algorithms, to different types 
of input speech frames. Each mode, or encoding-decoding 
process, is customized to optimally represent a certain type 
of speech segment, such as, e.g., voiced speech, unvoiced 
speech, transition speech (e.g., between voiced and 
unvoiced), and background noise (silence, or nonspeech) in 
the most efficient manner. An external, open-loop mode 
decision mechanism examines the input speech frame and 
makes a decision regarding which mode to apply to the 
frame. The open-loop mode decision is typically performed 
by extracting a number of parameters from the input frame, 
evaluating the parameters as to certain temporal and spectral 
characteristics, and basing a mode decision upon the evalu- 
ation. 
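An open-loop mode decision of the kind just described can be sketched as a threshold test on a few frame features. The example below is a simplified illustration only: it uses frame energy and zero crossing rate, and the thresholds, mode labels, and function names are hypothetical rather than taken from the referenced application.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(frame) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(frame, frame[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify_frame(frame, silence_energy=1e-4, voiced_zcr=0.25):
    """Pick a coding mode from energy and zero crossing rate.

    Both thresholds are hypothetical tuning values.
    """
    if frame_energy(frame) < silence_energy:
        return "silence"
    if zero_crossing_rate(frame) < voiced_zcr:
        return "voiced"
    return "unvoiced"


# Hypothetical usage: a 100 Hz tone sampled at 8 kHz, 160 samples.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
mode = classify_frame(tone)
```

A practical classifier would weigh more features (periodicity, SNR, spectral tilt) and would also detect transition frames; this sketch shows only the control flow.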

Coding systems that operate at rates on the order of 2.4 
kbps are generally parametric in nature. That is, such coding 
systems operate by transmitting parameters describing the 
pitch-period and the spectral envelope (or formants) of the 
speech signal at regular intervals. Illustrative of these 
so-called parametric coders is the LP vocoder system. 

LP vocoders model a voiced speech signal with a single 
pulse per pitch period. This basic technique may be aug- 
mented to include transmission information about the spec- 
tral envelope, among other things. Although LP vocoders 
provide reasonable performance generally, they may intro- 
duce perceptually significant distortion, typically character- 
ized as buzz. 
In recent years, coders have emerged that are hybrids of
both waveform coders and parametric coders. Illustrative of
these so-called hybrid coders is the prototype-waveform
interpolation (PWI) speech coding system. The PWI coding
system may also be known as a prototype pitch period (PPP)
speech coder. A PWI coding system provides an efficient
method for coding voiced speech. The basic concept of PWI
is to extract a representative pitch cycle (the prototype
waveform) at fixed intervals, to transmit its description, and
to reconstruct the speech signal by interpolating between the
prototype waveforms. The PWI method may operate either
on the LP residual signal or on the speech signal. An
exemplary PWI, or PPP, speech coder is described in U.S.
application Ser. No. 09/217,494, entitled PERIODIC
SPEECH CODING, filed Dec. 21, 1998, now U.S. Pat. No.
6,456,964 issued Oct. 24, 2002, assigned to the assignee of
the present invention, and fully incorporated herein by
reference. Other PWI, or PPP, speech coders are described
in U.S. Pat. No. 5,884,253 and W. Bastiaan Kleijn &
Wolfgang Granzow, Methods for Waveform Interpolation in
Speech Coding, in 1 Digital Signal Processing 215-230
(1991).
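The interpolation step at the heart of PWI can be illustrated with a linear cross-fade between two prototypes. The sketch below assumes the two prototypes have equal length and are already time-aligned, which real PWI coders must arrange explicitly because the pitch period varies; the function name and values are hypothetical.

```python
def interpolate_prototypes(proto_a, proto_b, num_cycles):
    """Rebuild num_cycles pitch cycles by cross-fading from proto_a to proto_b.

    Cycle c (1-based) uses weight w = c / num_cycles on proto_b, so the
    final cycle equals proto_b exactly.
    """
    if len(proto_a) != len(proto_b):
        raise ValueError("prototypes must have equal length in this sketch")
    if num_cycles < 1:
        raise ValueError("num_cycles must be at least 1")
    out = []
    for c in range(1, num_cycles + 1):
        w = c / num_cycles
        out.extend((1.0 - w) * a + w * b for a, b in zip(proto_a, proto_b))
    return out


# Hypothetical two-sample prototypes, two reconstructed cycles.
print(interpolate_prototypes([0.0, 2.0], [2.0, 0.0], 2))
# [1.0, 1.0, 2.0, 0.0]
```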

In most conventional speech coders, the parameters of a
given pitch prototype, or of a given frame, are each
individually quantized and transmitted by the encoder. In
addition, a difference value is transmitted for each
parameter. The difference value specifies the difference
between the parameter value for the current frame or
prototype and the parameter value for the previous frame or
prototype. However, quantizing the parameter values and the
difference values requires using bits (and hence bandwidth).
In a low-bit-rate speech coder, it is advantageous to transmit
the least number of bits possible to maintain satisfactory
voice quality. For this reason, in conventional low-bit-rate
speech coders, only the absolute parameter values are
quantized and transmitted. It would be desirable to decrease
the number of bits transmitted without decreasing the
informational value. Accordingly, a quantization scheme that
quantizes the difference between a weighted sum of the
parameter values for previous frames and the parameter value
for the current frame is described in a related U.S.
application Ser. No. 09/557,282, filed Apr. 24, 2000, entitled
"METHOD AND APPARATUS FOR PREDICTIVELY QUANTIZING
VOICED SPEECH," assigned to the assignee of the present
invention, and fully incorporated herein by reference.
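The predictive scheme mentioned above, quantizing the difference between a weighted sum of past parameter values and the current value, can be sketched as follows. The uniform scalar quantizer, the weights, the step size, and the function names are all assumptions made for illustration; the referenced application is not reproduced here.

```python
def predict(history, weights):
    """Weighted sum of previously decoded parameter values."""
    return sum(w * h for w, h in zip(weights, history))


def predictive_quantize(value, history, weights, step):
    """Quantize value - prediction with a uniform step (an assumption).

    Returns (index, reconstructed_value); only index is transmitted.
    """
    prediction = predict(history, weights)
    index = round((value - prediction) / step)
    return index, prediction + index * step


def predictive_dequantize(index, history, weights, step):
    """Decoder side: rebuild the value from the index and its own history."""
    return predict(history, weights) + index * step


# Hypothetical pitch lags: two past frames, current lag 52.0.
history = [50.0, 48.0]   # most recent first
weights = [0.75, 0.25]
index, rebuilt = predictive_quantize(52.0, history, weights, step=0.5)
print(index, rebuilt)    # 5 52.0
```

The decoder reproduces the value only if its `history` matches the encoder's, which is why a lost frame degrades a highly predictive coder more than a coder that sends absolute values.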

Speech coders experience frame erasure, or packet loss,
due to poor channel conditions. One solution used in
conventional speech coders was to have the decoder simply
repeat the previous frame in the event a frame erasure was
received. An improvement is found in the use of an adaptive
codebook, which dynamically adjusts the frame immediately
following a frame erasure. A further refinement, the
enhanced variable rate coder (EVRC), is standardized in the
Telecommunication Industry Association Interim Standard
EIA/TIA IS-127. The EVRC coder relies upon a correctly
received, low-predictively encoded frame to alter in the
coder memory the frame that was not received, and thereby
improve the quality of the correctly received frame.

A problem with the EVRC coder, however, is that
discontinuities between a frame erasure and a subsequent
adjusted good frame may arise. For example, pitch pulses
may be placed too close, or too far apart, as compared to
their relative locations in the event no frame erasure had
occurred. Such discontinuities may cause an audible click.
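The simplest of the conventional remedies described above, repeating the previous frame when an erasure is declared, can be sketched as below. For brevity the sketch repeats decoded samples and marks an erased frame with `None`; a practical decoder would more likely repeat coded parameters, and the function name and default frame length are hypothetical.

```python
def conceal_by_repetition(frames, frame_len=160):
    """Replace each erased frame (None) with a copy of the last good frame.

    Until a good frame has arrived, an erased frame is filled with silence.
    """
    last_good = [0.0] * frame_len
    out = []
    for frame in frames:
        if frame is not None:
            last_good = frame
        out.append(list(last_good))  # copy so callers can edit safely
    return out


# Hypothetical two-sample frames with one erasure in the middle.
received = [[1.0, 2.0], None, [3.0, 4.0]]
print(conceal_by_repetition(received, frame_len=2))
# [[1.0, 2.0], [1.0, 2.0], [3.0, 4.0]]
```

Plain repetition leaves the pitch track of the repeated frame unchanged, so the seam with the next good frame can show exactly the kind of misplaced pitch pulses noted above.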

In general, speech coders involving low predictability
(such as those described in the paragraph above) perform
better under frame erasure conditions. However, as
discussed, such speech coders require relatively higher bit
rates. Conversely, a highly predictive speech coder can
achieve a good quality of synthesized speech output
(particularly for highly periodic speech such as voiced
speech), but performs worse under frame erasure conditions.
It would be desirable to combine the qualities of both types
of speech coder. It would further be advantageous to provide
a method of smoothing discontinuities between frame
erasures and subsequent altered good frames. Thus, there is a
need for a frame erasure compensation method that improves
predictive coder performance in the event of frame erasures
and smoothes discontinuities between frame erasures and
subsequent good frames.

SUMMARY OF THE INVENTION

The present invention is directed to a frame erasure
compensation method that improves predictive coder
performance in the event of frame erasures and smoothes
discontinuities between frame erasures and subsequent good
frames. Accordingly, in one aspect of the invention, a
method of compensating for a frame erasure in a speech
coder is provided. The method advantageously includes
quantizing a pitch lag value and a delta value for a current
frame processed after an erased frame is declared, the delta
value being equal to the difference between the pitch lag
value for the current frame and a pitch lag value for a frame
immediately preceding the current frame; quantizing a delta
value for at least one frame prior to the current frame and
after the frame erasure, wherein the delta value is equal to
the difference between a pitch lag value for the at least one
frame and a pitch lag value for a frame immediately
preceding the at least one frame; and subtracting each delta
value from the pitch lag value for the current frame to
generate a pitch lag value for the erased frame.

In another aspect of the invention, a speech coder
configured to compensate for a frame erasure is provided. The
speech coder advantageously includes means for
quantizing a pitch lag value and a delta value for a current
frame processed after an erased frame is declared, the delta
value being equal to the difference between the pitch lag
value for the current frame and a pitch lag value for a frame
immediately preceding the current frame; means for
quantizing a delta value for at least one frame prior to the
current frame and after the frame erasure, wherein the delta
value is equal to the difference between a pitch lag value for
the at least one frame and a pitch lag value for a frame
immediately preceding the at least one frame; and means for
subtracting each delta value from the pitch lag value for the
current frame to generate a pitch lag value for the erased
frame.

In another aspect of the invention, a subscriber unit
configured to compensate for a frame erasure is provided.
The subscriber unit advantageously includes a first speech
coder configured to quantize a pitch lag value and a delta
value for a current frame processed after an erased frame is
declared, the delta value being equal to the difference
between the pitch lag value for the current frame and a pitch
lag value for a frame immediately preceding the current
frame; a second speech coder configured to quantize a delta
value for at least one frame prior to the current frame and
after the frame erasure, wherein the delta value is equal to
the difference between a pitch lag value for the at least one
frame and a pitch lag value for a frame immediately
preceding the at least one frame; and a control processor
coupled to the first and second speech coders and configured
to subtract each delta value from the pitch lag value for the
current frame to generate a pitch lag value for the erased
frame.

In another aspect of the invention, an infrastructure
element configured to compensate for a frame erasure is
provided. The infrastructure element advantageously
includes a processor; and a storage medium coupled to the
processor and containing a set of instructions executable by
the processor to quantize a pitch lag value and a delta value
for a current frame processed after an erased frame is
declared, the delta value being equal to the difference
between the pitch lag value for the current frame and a pitch
lag value for a frame immediately preceding the current
frame, quantize a delta value for at least one frame prior to
the current frame and after the frame erasure, wherein the
delta value is equal to the difference between a pitch lag
value for the at least one frame and a pitch lag value for a
frame immediately preceding the at least one frame, and
subtract each delta value from the pitch lag value for the
current frame to generate a pitch lag value for the erased
frame.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a wireless telephone system.

FIG. 2 is a block diagram of a communication channel
terminated at each end by speech coders.

FIG. 3 is a block diagram of a speech encoder.

FIG. 4 is a block diagram of a speech decoder.

FIG. 5 is a block diagram of a speech coder including
encoder/transmitter and decoder/receiver portions.

FIG. 6 is a graph of signal amplitude versus time for a
segment of voiced speech.

FIG. 7 illustrates a first frame erasure processing scheme
that can be used in the decoder/receiver portion of the speech
coder of FIG. 5.

FIG. 8 illustrates a second frame erasure processing
scheme tailored to a variable-rate speech coder, which can
be used in the decoder/receiver portion of the speech coder
of FIG. 5.

FIG. 9 plots signal amplitude versus time for various
linear predictive (LP) residue waveforms to illustrate a
frame erasure processing scheme that can be used to smooth
a transition between a corrupted frame and a good frame.

FIG. 10 plots signal amplitude versus time for various LP
residue waveforms to illustrate the benefits of the frame
erasure processing scheme depicted in FIG. 9.

FIG. 11 plots signal amplitude versus time for various
waveforms to illustrate a pitch period prototype or waveform
interpolation coding technique.

FIG. 12 is a block diagram of a processor coupled to a
storage medium.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

The exemplary embodiments described hereinbelow
reside in a wireless telephony communication system
configured to employ a CDMA over-the-air interface.
Nevertheless, it would be understood by those skilled in the
art that a method and apparatus for predictively coding
voiced speech embodying features of the instant invention
may reside in any of various communication systems
employing a wide range of technologies known to those of
skill in the art.

As illustrated in FIG. 1, a CDMA wireless telephone
system generally includes a plurality of mobile subscriber
units 10, a plurality of base stations 12, base station
controllers (BSCs) 14, and a mobile switching center (MSC) 16.
The MSC 16 is configured to interface with a conventional
public switched telephone network (PSTN) 18. The MSC 16 is
also configured to interface with the BSCs 14. The BSCs 14
are coupled to the base stations 12 via backhaul lines. The
backhaul lines may be configured to support any of several
known interfaces including, e.g., E1/T1, ATM, IP, PPP,
Frame Relay, HDSL, ADSL, or xDSL. It is understood that
there may be more than two BSCs 14 in the system. Each
base station 12 advantageously includes at least one sector
(not shown), each sector comprising an omnidirectional
antenna or an antenna pointed in a particular direction
radially away from the base station 12. Alternatively, each
sector may comprise two antennas for diversity reception.
Each base station 12 may advantageously be designed to
support a plurality of frequency assignments. The
intersection of a sector and a frequency assignment may be
referred to as a CDMA channel. The base stations 12 may also
be known as base station transceiver subsystems (BTSs) 12.
Alternatively, "base station" may be used in the industry to
refer collectively to a BSC 14 and one or more BTSs 12. The
BTSs 12 may also be denoted "cell sites" 12. Alternatively,
individual sectors of a given BTS 12 may be referred to as
cell sites. The mobile subscriber units 10 are typically
cellular or PCS telephones 10. The system is advantageously
configured for use in accordance with the IS-95 standard.

During typical operation of the cellular telephone system,
the base stations 12 receive sets of reverse link signals from
sets of mobile units 10. The mobile units 10 are conducting
telephone calls or other communications. Each reverse link
signal received by a given base station 12 is processed
within that base station 12. The resulting data is forwarded
to the BSC 14. The BSC 14 provides call resource allocation
and mobility management functionality including the
orchestration of soft handoffs between base stations 12. The
BSC 14 also routes the received data to the MSC 16, which
provides additional routing services for interface with the
PSTN 18. Similarly, the PSTN 18 interfaces with the MSC
16, and the MSC 16 interfaces with the BSC 14, which in
turn controls the base stations 12 to transmit sets of forward
link signals to sets of mobile units 10. It should be
understood by those of skill that the subscriber units 10 may
be fixed units in alternate embodiments.

In FIG. 2 a first encoder 100 receives digitized speech
samples s(n) and encodes the samples s(n) for transmission
on a transmission medium 102, or communication channel
102, to a first decoder 104. The decoder 104 decodes the
encoded speech samples and synthesizes an output speech
signal $s_{SYNTH}(n)$. For transmission in the opposite direction,
a second encoder 106 encodes digitized speech samples s(n),
which are transmitted on a communication channel 108. A
second decoder 110 receives and decodes the encoded
speech samples, generating a synthesized output speech
signal $s_{SYNTH}(n)$.

The speech samples s(n) represent speech signals that
have been digitized and quantized in accordance with any of
various methods known in the art including, e.g., pulse code
modulation (PCM), companded μ-law, or A-law. As known
in the art, the speech samples s(n) are organized into frames
of input data wherein each frame comprises a predetermined
number of digitized speech samples s(n). In an exemplary
embodiment, a sampling rate of 8 kHz is employed, with
each 20 ms frame comprising 160 samples. In the
embodiments described below, the rate of data transmission
may advantageously be varied on a frame-by-frame basis from
full rate to half rate to quarter rate to eighth rate. Varying the
data transmission rate is advantageous because lower bit
rates may be selectively employed for frames containing
relatively less speech information. As understood by those
skilled in the art, other sampling rates and/or frame sizes
may be used. Also in the embodiments described below, the
speech encoding (or coding) mode may be varied on a
frame-by-frame basis in response to the speech information
or energy of the frame.

The first encoder 100 and the second decoder 110 together
comprise a first speech coder (encoder/decoder), or speech
codec. The speech coder could be used in any communication
device for transmitting speech signals, including, e.g.,
the subscriber units, BTSs, or BSCs described above with
reference to FIG. 1. Similarly, the second encoder 106 and
the first decoder 104 together comprise a second speech
coder. It is understood by those of skill in the art that speech
coders may be implemented with a digital signal processor
(DSP), an application-specific integrated circuit (ASIC),
discrete gate logic, firmware, or any conventional
programmable software module and a microprocessor. The
software module could reside in RAM memory, flash memory,
registers, or any other form of storage medium known in the
art. Alternatively, any conventional processor, controller, or
state machine could be substituted for the microprocessor.
Exemplary ASICs designed specifically for speech coding
are described in U.S. Pat. No. 5,727,123, assigned to the
assignee of the present invention and fully incorporated
herein by reference, and U.S. application Ser. No.
08/197,417, entitled VOCODER ASIC, filed Feb. 16, 1994, now
U.S. Pat. No. 5,784,532 issued Jul. 21, 1998, assigned to the
assignee of the present invention, and fully incorporated
herein by reference.

In FIG. 3 an encoder 200 that may be used in a speech
coder includes a mode decision module 202, a pitch
estimation module 204, an LP analysis module 206, an LP
analysis filter 208, an LP quantization module 210, and a
residue quantization module 212. Input speech frames s(n)
are provided to the mode decision module 202, the pitch
estimation module 204, the LP analysis module 206, and the
LP analysis filter 208. The mode decision module 202
produces a mode index $I_M$ and a mode M based upon the
periodicity, energy, signal-to-noise ratio (SNR), or zero
crossing rate, among other features, of each input speech
frame s(n). Various methods of classifying speech frames
according to periodicity are described in U.S. Pat. No.
5,911,128, which is assigned to the assignee of the present
invention and fully incorporated herein by reference. Such
methods are also incorporated into the Telecommunication
Industry Association Interim Standards TIA/EIA IS-127 and
TIA/EIA IS-733. An exemplary mode decision scheme is
also described in the aforementioned U.S. application Ser.
No. 09/217,341.

The pitch estimation module 204 produces a pitch index
$I_P$ and a lag value $P_0$ based upon each input speech frame
s(n). The LP analysis module 206 performs linear predictive
analysis on each input speech frame s(n) to generate an LP
parameter $a$. The LP parameter $a$ is provided to the LP
quantization module 210. The LP quantization module 210
also receives the mode M, thereby performing the
quantization process in a mode-dependent manner. The LP
quantization module 210 produces an LP index $I_{LP}$ and a
quantized LP parameter $\hat{a}$. The LP analysis filter 208
receives the quantized LP parameter $\hat{a}$ in addition to the
input speech frame s(n). The LP analysis filter 208 generates
an LP residue signal R[n], which represents the error between
the input speech frames s(n) and the reconstructed speech
based on the quantized linear predicted parameters $\hat{a}$. The
LP residue R[n], the mode M, and the quantized LP parameter
$\hat{a}$ are provided to the residue quantization module 212.
Based upon these values, the residue quantization module
212 produces a residue index I_R and a quantized residue
signal R̂[n].

In FIG. 4 a decoder 300 that may be used in a speech
coder includes an LP parameter decoding module 302, a
residue decoding module 304, a mode decoding module 306,
and an LP synthesis filter 308. The mode decoding module
306 receives and decodes a mode index I_M, generating
therefrom a mode M. The LP parameter decoding module
302 receives the mode M and an LP index I_LP. The LP
parameter decoding module 302 decodes the received values
to produce a quantized LP parameter â. The residue decoding
module 304 receives a residue index I_R, a pitch index I_P,
and the mode index I_M. The residue decoding module 304
decodes the received values to generate a quantized residue
signal R̂[n]. The quantized residue signal R̂[n] and the
quantized LP parameter â are provided to the LP synthesis
filter 308, which synthesizes a decoded output speech signal
ŝ[n] therefrom.

Operation and implementation of the various modules of
the encoder 200 of FIG. 3 and the decoder 300 of FIG. 4 are
known in the art and described in the aforementioned U.S.
Pat. No. 5,414,796 and L. B. Rabiner & R. W. Schafer,
Digital Processing of Speech Signals 396-453 (1978).

In one embodiment, illustrated in FIG. 5, a multimode
speech encoder 400 communicates with a multimode speech
decoder 402 across a communication channel, or transmission
medium, 404. The communication channel 404 is
advantageously an RF interface configured in accordance
with the IS-95 standard. It would be understood by those of
skill in the art that the encoder 400 has an associated decoder
(not shown). The encoder 400 and its associated decoder
together form a first speech coder. It would also be understood
by those of skill in the art that the decoder 402 has an
associated encoder (not shown). The decoder 402 and its
associated encoder together form a second speech coder. The
first and second speech coders may advantageously be
implemented as part of first and second DSPs, and may
reside in, e.g., a subscriber unit and a base station in a PCS
or cellular telephone system, or in a subscriber unit and a
gateway in a satellite system.

The encoder 400 includes a parameter calculator 406, a
mode classification module 408, a plurality of encoding
modes 410, and a packet formatting module 412. The
number of encoding modes 410 is shown as n, which one of
skill would understand could signify any reasonable number
of encoding modes 410. For simplicity, only three encoding
modes 410 are shown, with a dotted line indicating the
existence of other encoding modes 410. The decoder 402
includes a packet disassembler and packet loss detector
module 414, a plurality of decoding modes 416, an erasure
decoder 418, and a post filter, or speech synthesizer, 420.
The number of decoding modes 416 is shown as n, which
one of skill would understand could signify any reasonable
number of decoding modes 416. For simplicity, only three
decoding modes 416 are shown, with a dotted line indicating
the existence of other decoding modes 416.

A speech signal, s(n), is provided to the parameter
calculator 406. The speech signal is divided into blocks of
samples called frames. The value n designates the frame
number. In an alternate embodiment, a linear prediction (LP)
residual error signal is used in place of the speech signal.
The LP residue is used by speech coders such as, e.g., the
CELP coder. Computation of the LP residue is advantageously
performed by providing the speech signal to an
inverse LP filter (not shown). The transfer function of the
inverse LP filter, A(z), is computed in accordance with the
following equation:

    A(z) = 1 - a_1 z^{-1} - a_2 z^{-2} - ... - a_p z^{-p},

in which the coefficients a_i are filter taps having predefined
values chosen in accordance with known methods, as
described in the aforementioned U.S. Pat. Nos. 5,414,796
and 6,456,964. The number p indicates the number of
previous samples the inverse LP filter uses for prediction
purposes. In a particular embodiment, p is set to ten.

The parameter calculator 406 derives various parameters
based on the current frame. In one embodiment these
parameters include at least one of the following: linear
predictive coding (LPC) filter coefficients, line spectral pair
(LSP) coefficients, normalized autocorrelation functions
(NACFs), open-loop lag, zero crossing rates, band energies,
and the formant residual signal. Computation of LPC
coefficients, LSP coefficients, open-loop lag, band energies,
and the formant residual signal is described in detail in the
aforementioned U.S. Pat. No. 5,414,796. Computation of
NACFs and zero crossing rates is described in detail in the
aforementioned U.S. Pat. No. 5,911,128.

The parameter calculator 406 is coupled to the mode
classification module 408. The parameter calculator 406
provides the parameters to the mode classification module
408. The mode classification module 408 is coupled to
dynamically switch between the encoding modes 410 on a
frame-by-frame basis in order to select the most appropriate
encoding mode 410 for the current frame. The mode
classification module 408 selects a particular encoding mode
410 for the current frame by comparing the parameters with
predefined threshold and/or ceiling values. Based upon the
energy content of the frame, the mode classification module
408 classifies the frame as nonspeech, or inactive speech
(e.g., silence, background noise, or pauses between words),
or speech. Based upon the periodicity of the frame, the mode
classification module 408 then classifies speech frames as a
particular type of speech, e.g., voiced, unvoiced, or
transient.

Voiced speech is speech that exhibits a relatively high
degree of periodicity. A segment of voiced speech is shown
in the graph of FIG. 6. As illustrated, the pitch period is a
component of a speech frame that may be used to advantage
to analyze and reconstruct the contents of the frame.
Unvoiced speech typically comprises consonant sounds.
Transient speech frames are typically transitions between
voiced and unvoiced speech. Frames that are classified as
neither voiced nor unvoiced speech are classified as transient
speech. It would be understood by those skilled in the art that
any reasonable classification scheme could be employed.

Classifying the speech frames is advantageous because
different encoding modes 410 can be used to encode different
types of speech, resulting in more efficient use of
bandwidth in a shared channel such as the communication
channel 404. For example, as voiced speech is periodic and
thus highly predictive, a low-bit-rate, highly predictive
encoding mode 410 can be employed to encode voiced
speech. Classification modules such as the classification
module 408 are described in detail in the aforementioned
U.S. application Ser. No. 09/217,341 and in U.S. application
Ser. No. 09/259,151 entitled CLOSED-LOOP MULTIMODE
MIXED-DOMAIN LINEAR PREDICTION
(MDLP) SPEECH CODER, filed Feb. 26, 1999, assigned to
the assignee of the present invention, and fully incorporated
herein by reference.

The mode classification module 408 selects an encoding
mode 410 for the current frame based upon the classification
of the frame. The various encoding modes 410 are coupled
in parallel. One or more of the encoding modes 410 may be
operational at any given time. Nevertheless, only one encod- 
ing mode 410 advantageously operates at any given time, 
and is selected according to the classification of the current 
frame. 
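
The two-stage classification and mode selection described above can be sketched as follows. This is only an illustration: the threshold values, the use of a single NACF value as the periodicity measure, the function names, and the mode chosen for inactive frames are all assumptions and are not taken from the specification.

```python
from enum import Enum


class FrameClass(Enum):
    INACTIVE = "inactive"
    VOICED = "voiced"
    UNVOICED = "unvoiced"
    TRANSIENT = "transient"


# Hypothetical thresholds, chosen only for illustration.
ENERGY_THRESHOLD = 0.01
VOICED_NACF = 0.7
UNVOICED_NACF = 0.3


def classify_frame(energy, nacf):
    """Classify by energy first (speech vs. inactive), then by periodicity."""
    if energy < ENERGY_THRESHOLD:
        return FrameClass.INACTIVE
    if nacf >= VOICED_NACF:
        return FrameClass.VOICED
    if nacf <= UNVOICED_NACF:
        return FrameClass.UNVOICED
    # Neither voiced nor unvoiced: treated as transient speech.
    return FrameClass.TRANSIENT


# One possible class-to-mode mapping. The inactive entry is an assumption.
MODE_FOR_CLASS = {
    FrameClass.TRANSIENT: "full-rate CELP",
    FrameClass.VOICED: "quarter-rate PPP",
    FrameClass.UNVOICED: "NELP",
    FrameClass.INACTIVE: "eighth-rate (assumed)",
}


def select_mode(energy, nacf):
    """Select exactly one encoding mode for the current frame."""
    return MODE_FOR_CLASS[classify_frame(energy, nacf)]
```

Only one mode is returned per frame, which mirrors the statement that a single encoding mode operates on any given frame.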

The different encoding modes 410 advantageously oper- 
ate according to different coding bit rates, different coding 
schemes, or different combinations of coding bit rate and 
coding scheme. The various coding rates used may be full 
rate, half rate, quarter rate, and/or eighth rate. The various 
coding schemes used may be CELP coding, prototype pitch 
period (PPP) coding (or waveform interpolation (WI)
coding), and/or noise excited linear prediction (NELP) cod- 
ing. Thus, for example, a particular encoding mode 410 
could be full rate CELP, another encoding mode 410 could 
be half rate CELP, another encoding mode 410 could be 
quarter rate PPP, and another encoding mode 410 could be 
NELP. 

In accordance with a CELP encoding mode 410, a linear 
predictive vocal tract model is excited with a quantized 
version of the LP residual signal. The quantized parameters
for the entire previous frame are used to reconstruct the 
current frame. The CELP encoding mode 410 thus provides 
for relatively accurate reproduction of speech but at the cost 
of a relatively high coding bit rate. The CELP encoding 
mode 410 may advantageously be used to encode frames 
classified as transient speech. An exemplary variable rate 
CELP speech coder is described in detail in the aforemen- 
tioned U.S. Pat. No. 5,414,796. 
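
As a rough illustration of the relationship between the LP residual and the vocal tract (synthesis) model, the sketch below computes a residual with an inverse filter and then excites the corresponding synthesis filter with it. It assumes the sign convention A(z) = 1 - a_1 z^{-1} - ... - a_p z^{-p} and zero initial filter state, and it omits quantization entirely. The function names and coefficient values are hypothetical.

```python
def lp_residual(speech, coeffs):
    """Inverse (analysis) filter: r[n] = s[n] - sum_i a_i * s[n - i].

    coeffs[i] holds a_(i+1). Samples before the start are taken as zero.
    """
    residual = []
    for n in range(len(speech)):
        prediction = sum(
            coeffs[i] * speech[n - 1 - i]
            for i in range(len(coeffs))
            if n - 1 - i >= 0
        )
        residual.append(speech[n] - prediction)
    return residual


def lp_synthesis(excitation, coeffs):
    """Synthesis filter 1/A(z): s[n] = e[n] + sum_i a_i * s[n - i]."""
    output = []
    for n in range(len(excitation)):
        prediction = sum(
            coeffs[i] * output[n - 1 - i]
            for i in range(len(coeffs))
            if n - 1 - i >= 0
        )
        output.append(excitation[n] + prediction)
    return output
```

With an unquantized residual the synthesis filter returns the original samples. In a real coder the excitation is a quantized approximation, so the output only approximates the input.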

In accordance with a NELP encoding mode 410, a filtered, 
pseudo-random noise signal is used to model the speech 
frame. The NELP encoding mode 410 is a relatively simple 
technique that achieves a low bit rate. The NELP encoding 
mode 410 may be used to advantage to encode frames 
classified as unvoiced speech. An exemplary NELP encod- 
ing mode is described in detail in the aforementioned U.S. 
Pat. No. 6,456,964. 

In accordance with a PPP encoding mode 410, only a 
subset of the pitch periods within each frame are encoded. 
The remaining periods of the speech signal are reconstructed 
by interpolating between these prototype periods. In a time- 
domain implementation of PPP coding, a first set of param- 
eters is calculated that describes how to modify a previous 
prototype period to approximate the current prototype 
period. One or more codevectors are selected which, when 
summed, approximate the difference between the current 
prototype period and the modified previous prototype 
period. A second set of parameters describes these selected 
codevectors. In a frequency-domain implementation of PPP 
coding, a set of parameters is calculated to describe ampli- 
tude and phase spectra of the prototype. This may be done 
either in an absolute sense or predictively. A method for 
predictively quantizing the amplitude and phase spectra of a 
prototype (or of an entire frame) is described in the afore- 
mentioned related U.S. application Ser. No. 09/557,282, 
filed Apr. 24, 2000, and entitled "METHOD AND APPA- 
RATUS FOR PREDICTIVELY QUANTIZING VOICED 
SPEECH." In accordance with either implementation of PPP 
coding, the decoder synthesizes an output speech signal by 
reconstructing a current prototype based upon the first and 
second sets of parameters. The speech signal is then inter- 
polated over the region between the current reconstructed 
prototype period and a previous reconstructed prototype 
period. The prototype is thus a portion of the current frame 
that will be linearly interpolated with prototypes from pre- 
vious frames that were similarly positioned within the frame 
in order to reconstruct the speech signal or the LP residual
signal at the decoder (i.e., a past prototype period is used as
a predictor of the current prototype period). An exemplary
PPP speech coder is described in detail in the aforementioned
U.S. Pat. No. 6,456,964.
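
The interpolation step can be illustrated with a minimal sketch. It assumes that the previous and current prototypes have already been reconstructed and have the same length; a practical coder must also handle differing pitch periods and alignment, which are not shown. The function name and arguments are hypothetical.

```python
def interpolate_prototypes(prev_proto, curr_proto, num_periods):
    """Return num_periods pitch cycles that morph linearly from the
    previous prototype toward the current prototype."""
    if len(prev_proto) != len(curr_proto):
        raise ValueError("this sketch assumes equal-length prototypes")
    cycles = []
    for k in range(1, num_periods + 1):
        weight = k / num_periods  # weight of the current prototype
        cycles.append(
            [(1.0 - weight) * a + weight * b
             for a, b in zip(prev_proto, curr_proto)]
        )
    return cycles
```

The last generated cycle equals the current prototype, so only the prototypes themselves need to be coded and the intermediate cycles are derived at the decoder.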

Coding the prototype period rather than the entire speech 
frame reduces the required coding bit rate. Frames classified 
as voiced speech may advantageously be coded with a PPP 
encoding mode 410. As illustrated in FIG. 6, voiced speech
contains slowly time-varying, periodic components that are
exploited to advantage by the PPP encoding mode 410. By 
exploiting the periodicity of the voiced speech, the PPP 
encoding mode 410 is able to achieve a lower bit rate than 
the CELP encoding mode 410. 

The selected encoding mode 410 is coupled to the packet
formatting module 412. The selected encoding mode 410
encodes, or quantizes, the current frame and provides the
quantized frame parameters to the packet formatting module
412. The packet formatting module 412 advantageously
assembles the quantized information into packets for
transmission over the communication channel 404. In one
embodiment the packet formatting module 412 is configured
to provide error correction coding and format the packet in
accordance with the IS-95 standard. The packet is provided
to a transmitter (not shown), converted to analog format,
modulated, and transmitted over the communication channel
404 to a receiver (also not shown), which receives,
demodulates, and digitizes the packet, and provides the
packet to the decoder 402.

In the decoder 402, the packet disassembler and packet
loss detector module 414 receives the packet from the
receiver. The packet disassembler and packet loss detector
module 414 is coupled to dynamically switch between the
decoding modes 416 on a packet-by-packet basis. The
number of decoding modes 416 is the same as the number
of encoding modes 410, and as one skilled in the art would
recognize, each numbered encoding mode 410 is associated
with a respective similarly numbered decoding mode 416
configured to employ the same coding bit rate and coding
scheme.

If the packet disassembler and packet loss detector module
414 detects the packet, the packet is disassembled and
provided to the pertinent decoding mode 416. If the packet
disassembler and packet loss detector module 414 does not
detect a packet, a packet loss is declared and the erasure
decoder 418 advantageously performs frame erasure
processing as described in detail below.

The parallel array of decoding modes 416 and the erasure
decoder 418 are coupled to the post filter 420. The pertinent
decoding mode 416 decodes, or de-quantizes, the packet and
provides the information to the post filter 420. The post filter
420 reconstructs, or synthesizes, the speech frame, outputting
synthesized speech frames, ŝ(n). Exemplary decoding
modes and post filters are described in detail in the
aforementioned U.S. Pat. Nos. 5,414,796 and 6,456,964.
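
The routing performed by the packet disassembler and packet loss detector can be sketched as a simple dispatch. The packet layout (a dictionary with "mode" and "payload" keys), the use of None to represent a missing packet, and the callable decoders are assumptions made only for this illustration.

```python
def decode_packet(packet, decoding_modes, erasure_decoder):
    """Route a received packet to its decoding mode, or invoke the
    erasure decoder when no packet was detected (packet is None)."""
    if packet is None:
        # Packet loss declared: fall back to frame erasure processing.
        return erasure_decoder()
    decoder = decoding_modes[packet["mode"]]
    return decoder(packet["payload"])
```

Either branch produces the information handed to the post filter, so the stages downstream do not need to know whether the frame was decoded normally or concealed.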

In one embodiment the quantized parameters themselves
are not transmitted. Instead, codebook indices specifying
addresses in various lookup tables (LUTs) (not shown) in the
decoder 402 are transmitted. The decoder 402 receives the
codebook indices and searches the various codebook LUTs
for appropriate parameter values. Accordingly, codebook 
indices for parameters such as, e.g., pitch lag, adaptive 
codebook gain, and LSP may be transmitted, and three 
associated codebook LUTs are searched by the decoder 402. 
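
A minimal sketch of index-based transmission follows. The table contents (integer pitch lags from 20 to 147) and the nearest-entry search are assumptions chosen for illustration and do not reflect the actual codebooks.

```python
# Hypothetical pitch lag codebook: 128 entries, so an index fits in 7 bits.
PITCH_LAG_LUT = [20 + i for i in range(128)]


def nearest_index(table, value):
    """Encoder side: pick the index of the closest table entry."""
    return min(range(len(table)), key=lambda i: abs(table[i] - value))


def lookup(table, index):
    """Decoder side: recover the parameter value from the received index."""
    return table[index]
```

Only the index crosses the channel. Both ends must hold identical tables for the lookup to return the value the encoder selected.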

In accordance with the CELP encoding mode 410, pitch
lag, amplitude, phase, and LSP parameters are transmitted.
The LSP codebook indices are transmitted because the LP
residue signal is to be synthesized at the decoder 402.
Additionally, the difference between the pitch lag value for 
the current frame and the pitch lag value for the previous 
frame is transmitted. 

In accordance with a conventional PPP encoding mode in 
which the speech signal is to be synthesized at the decoder, 
only the pitch lag, amplitude, and phase parameters are 
transmitted. The lower bit rate employed by conventional 
PPP speech coding techniques does not permit transmission 
of both absolute pitch lag information and relative pitch lag 
difference values. 

In accordance with one embodiment, highly periodic 
frames such as voiced speech frames are transmitted with a 
low-bit-rate PPP encoding mode 410 that quantizes the 
difference between the pitch lag value for the current frame 
and the pitch lag value for the previous frame for 
transmission, and does not quantize the pitch lag value for 
the current frame for transmission. Because voiced frames 
are highly periodic in nature, transmitting the difference 
value as opposed to the absolute pitch lag value allows a 
lower coding bit rate to be achieved. In one embodiment this 
quantization is generalized such that a weighted sum of the 
parameter values for previous frames is computed, wherein 
the sum of the weights is one, and the weighted sum is 
subtracted from the parameter value for the current frame. 
The difference is then quantized. This technique is described 
in detail in the aforementioned related U.S. application Ser. 
No. 09/557,282, filed Apr. 24, 2000, and entitled "METHOD 
AND APPARATUS FOR PREDICTIVELY QUANTIZING
VOICED SPEECH." 
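
The generalized predictive quantization described above can be sketched as follows. The uniform scalar quantizer, the step size, and the function names are assumptions; the related application may use a different quantizer.

```python
def _predict(previous_values, weights):
    """Weighted sum of previous-frame parameter values."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * v for w, v in zip(weights, previous_values))


def quantize_delta(current, previous_values, weights, step=1.0):
    """Quantize the difference between the current value and the
    prediction formed from previous frames. Returns an integer index."""
    return round((current - _predict(previous_values, weights)) / step)


def dequantize_delta(index, previous_values, weights, step=1.0):
    """Reconstruct the current value from the index and the same prediction."""
    return _predict(previous_values, weights) + index * step
```

With a single previous value and a weight of one, this reduces to coding the plain difference between the current and previous pitch lag values.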

In accordance with one embodiment, a variable-rate cod- 
ing system encodes different types of speech as determined 
by a control processor with different encoders, or encoding 
modes, controlled by the processor, or mode classifier. The 
encoders modify the current frame residual signal (or in the 
alternative, the speech signal) according to a pitch contour 
as specified by the pitch lag value for the previous frame, L_{-1},
and the pitch lag value for the current frame, L. A control 
processor for the decoders follows the same pitch contour to 
reconstruct an adaptive codebook contribution, {P(n)}, from 
a pitch memory for the quantized residual or speech for the 
current frame. 

If the previous pitch lag value, L_{-1}, is lost, the decoders
cannot reconstruct the correct pitch contour. This causes the 
adaptive codebook contribution, {P(n)}, to be distorted. In 
turn, the synthesized speech will suffer severe degradation 
even though a packet is not lost for the current frame. As a 
remedy, some conventional coders employ a scheme to 
encode both L and the difference between L and L_{-1}. This
difference, or delta pitch value, may be denoted by Δ, where
Δ = L - L_{-1}, and serves the purpose of recovering L_{-1} if
L_{-1} is lost in the previous frame.

The presently described embodiment may be used to best 
advantage in a variable-rate coding system. Specifically, a 
first encoder (or encoding mode), denoted by C, encodes the 
current frame pitch lag value, L, and the delta pitch lag
value, Δ, as described above. A second encoder (or encoding
mode), denoted by Q, encodes the delta pitch lag value, Δ,
but does not necessarily encode the pitch lag value, L. This
allows the second coder, Q, to use the additional bits to 
encode other parameters or to save the bits altogether (i.e., 
to function as a low-bit-rate coder). The first coder, C, may 
advantageously be a coder used to encode relatively non- 
periodic speech such as, e.g., a full rate CELP coder. The 
second coder, Q, may advantageously be a coder used to 
encode highly periodic speech (e.g., voiced speech) such as, 
e.g., a quarter rate PPP coder. 

As illustrated in the example of FIG. 7, if the packet of the
previous frame, frame n-1, is lost, the pitch memory
contribution, {P_{-2}(n)}, after decoding the frame received
prior to the previous frame, frame n-2, is stored in the coder
memory (not shown). The pitch lag value for frame n-2,
L_{-2}, is also stored in the coder memory. If the current frame,
frame n, is encoded by coder C, frame n may be called a C
frame. Coder C can restore the previous pitch lag value, L_{-1},
from the delta pitch value, Δ, using the equation L_{-1} = L - Δ.
Hence, a correct pitch contour can be reconstructed with the
values L_{-1} and L_{-2}. The adaptive codebook contribution for
frame n-1 can be repaired given the right pitch contour, and
is subsequently used to generate the adaptive codebook
contribution for frame n. Those skilled in the art understand
that such a scheme is used in some conventional coders such
as the EVRC coder.

In accordance with one embodiment, frame erasure
performance in a variable-rate speech coding system using the
above-described two types of coders (coder C and coder Q)
is enhanced as described below. As illustrated in the example
of FIG. 8, a variable-rate coding system may be designed to
use both coder C and coder Q. The current frame, frame n,
is a C frame and its packet is not lost. The previous frame,
frame n-1, is a Q frame. The packet for the frame preceding
the Q frame (i.e., the packet for frame n-2) was lost.

In frame erasure processing for frame n-2, the pitch
memory contribution, {P_{-3}(n)}, after decoding frame n-3 is
stored in the coder memory (not shown). The pitch lag value
for frame n-3, L_{-3}, is also stored in the coder memory. The
pitch lag value for frame n-1, L_{-1}, can be recovered by using
the delta pitch lag value, Δ (which is equal to L - L_{-1}), in the
C frame packet according to the equation L_{-1} = L - Δ. Frame
n-1 is a Q frame with an associated encoded delta pitch lag
value of its own, Δ_{-1}, equal to L_{-1} - L_{-2}. Hence, the pitch lag
value for the erasure frame, frame n-2, L_{-2}, can be
recovered according to the equation L_{-2} = L_{-1} - Δ_{-1}. With the
correct pitch lag values for frame n-2 and frame n-1, pitch
contours for these frames can advantageously be
reconstructed and the adaptive codebook contribution can be
repaired accordingly. Hence, the C frame will have the
improved pitch memory required to compute the adaptive
codebook contribution for its quantized LP residual signal
(or speech signal). This method can be readily extended to
allow for the existence of multiple Q frames between the
erasure frame and the C frame as can be appreciated by those
skilled in the art.
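
The backward recovery of pitch lag values, including the extension to several Q frames between the erasure frame and the C frame, can be sketched as a running subtraction. The function name and the list-based interface are assumptions made for illustration.

```python
def recover_pitch_lags(current_lag, deltas):
    """Recover earlier pitch lag values by successive subtraction.

    current_lag: L, the absolute pitch lag decoded from the C frame.
    deltas: delta values ordered from newest to oldest. deltas[0] is the
        delta carried by the C frame (L - L_{-1}), deltas[1] is the delta
        carried by the Q frame n-1 (L_{-1} - L_{-2}), and so on.

    Returns [L_{-1}, L_{-2}, ...]; the last entry is the pitch lag value
    for the erased frame.
    """
    lags = []
    lag = current_lag
    for delta in deltas:
        lag = lag - delta  # L_{-k} = L_{-(k-1)} - delta_{-(k-1)}
        lags.append(lag)
    return lags
```

For example, with L = 60, a C frame delta of 3, and a Q frame delta of -2, the recovered values are L_{-1} = 57 and L_{-2} = 59. Each additional intervening Q frame contributes one more delta to the list.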

As shown graphically in FIG. 9, when a frame is erased,
the erasure decoder (e.g., element 418 of FIG. 5) reconstructs
the quantized LP residual (or speech signal) without
the exact information of the frame. If the pitch contour and
the pitch memory of the erased frame were restored in
accordance with the above-described method for reconstructing
the quantized LP residual (or speech signal) of the
current frame, the resultant quantized LP residual (or speech
signal) would be different than that had the corrupted pitch
memory been used. Such a change in the coder pitch
memory will result in a discontinuity in quantized residuals
(or speech signals) across frames. Hence, a transition sound,
or click, is often heard in conventional speech coders such
as the EVRC coder.

In accordance with one embodiment, pitch period prototypes
are extracted from the corrupted pitch memory prior to
repair. The LP residual (or speech signal) for the current
frame is also extracted in accordance with a normal
dequantization process. The quantized LP residual (or speech
signal) for the current frame is then reconstructed in accor- 
dance with a waveform interpolation (WI) method. In a 
particular embodiment, the WI method operates according
to the PPP encoding mode described above. This method 
advantageously serves to smooth the discontinuity described 
above and to further enhance the frame erasure performance 
of the speech coder. Such a WI scheme can be used 
whenever the pitch memory is repaired due to erasure 
processing regardless of the techniques used to accomplish 
the repair (including, but not limited to, e.g., the techniques
described previously hereinabove).

The graphs of FIG. 10 illustrate the difference in appear- 
ance between an LP residual signal having been adjusted in 
accordance with conventional techniques, producing an 
audible click, and an LP residual signal having been subse- 
quently smoothed in accordance with the above-described 
WI smoothing scheme. The graphs of FIG. 11 illustrate
principles of a PPP or WI coding technique. 

Thus, a novel and improved frame erasure compensation 
method in a variable-rate speech coder has been described. 
Those of skill in the art would understand that the data, 
instructions, commands, information, signals, bits, symbols, 
and chips that may be referenced throughout the above 
description are advantageously represented by voltages, 
currents, electromagnetic waves, magnetic fields or 
particles, optical fields or particles, or any combination 
thereof. Those of skill would further appreciate that the 
various illustrative logical blocks, modules, circuits, and 
algorithm steps described in connection with the embodi- 
ments disclosed herein may be implemented as electronic 
hardware, computer software, or combinations of both. The 
various illustrative components, blocks, modules, circuits, 
and steps have been described generally in terms of their 
functionality. Whether the functionality is implemented as 
hardware or software depends upon the particular applica- 
tion and design constraints imposed on the overall system. 
Skilled artisans recognize the interchangeability of hardware
and software under these circumstances, and how best to 
implement the described functionality for each particular 
application. As examples, the various illustrative logical 
blocks, modules, circuits, and algorithm steps described in 
connection with the embodiments disclosed herein may be 
implemented or performed with a digital signal processor 
(DSP), an application specific integrated circuit (ASIC), a 
field programmable gate array (FPGA) or other program- 
mable logic device, discrete gate or transistor logic, discrete 
hardware components such as, e.g., registers and FIFO, a 
processor executing a set of firmware instructions, any 
conventional programmable software module and a 
processor, or any combination thereof designed to perform 
the functions described herein. The processor may advan- 
tageously be a microprocessor, but in the alternative, the 
processor may be any conventional processor, controller, 
microcontroller, or state machine. The software module 
could reside in RAM memory, flash memory, ROM memory, 
EPROM memory, EEPROM memory, registers, hard disk, a 
removable disk, a CD-ROM, or any other form of storage 
medium known in the art. As illustrated in FIG. 12, an 
exemplary processor 500 is advantageously coupled to a 
storage medium 502 so as to read information from, and 
write information to, the storage medium 502. In the 
alternative, the storage medium 502 may be integral to the 
processor 500. The processor 500 and the storage medium
502 may reside in an ASIC (not shown). The ASIC may 
reside in a telephone (not shown). In the alternative, the 
processor 500 and the storage medium 502 may reside in a 
telephone. The processor 500 may be implemented as a 
combination of a DSP and a microprocessor, or as two 
microprocessors in conjunction with a DSP core, etc. 

Preferred embodiments of the present invention have thus 
been shown and described. It would be apparent to one of 
ordinary skill in the art, however, that numerous alterations 
may be made to the embodiments herein disclosed without
departing from the spirit or scope of the invention. 
Therefore, the present invention is not to be limited except 
in accordance with the following claims. 
What is claimed is: 

1. A method of compensating for a frame erasure in a 
variable rate speech coder, comprising: 

dequantizing a pitch lag value and a first delta value for 
a current frame processed after an erased frame is 
declared, the first delta value being equal to the differ- 
ence between the pitch lag value for the current frame 
and a pitch lag value for a frame immediately preceding 
the current frame, the current frame encoded according 
to a first encoding mode; 

dequantizing at least one delta value for at least one frame 
prior to the current frame and after the frame erasure, 
wherein the at least one delta value is equal to the 
difference between a pitch lag value for the at least one 
frame and a pitch lag value for a frame immediately 
preceding the at least one frame, the at least one frame 
encoded according to a second encoding mode different 
from the first encoding mode; and 

subtracting each delta value from the pitch lag value for 
the current frame to generate a pitch lag value for the 
erased frame. 

2. The method of claim 1, further comprising reconstruct- 
ing the erased frame to generate a reconstructed frame. 

3. The method of claim 2, further comprising performing 
a waveform interpolation to smooth any discontinuity exist- 
ing between the current frame and the reconstructed frame. 

4. The method of claim 1, wherein dequantizing the pitch 
lag value and a first delta value for a current frame is 
performed in accordance with a relatively nonpredictive 
coding mode. 

5. The method of claim 1, wherein dequantizing at least 
one delta value is performed in accordance with a relatively 
predictive coding mode. 

6. A variable rate speech coder configured to compensate 
for a frame erasure, comprising: 

means for decoding a pitch lag value and a first delta value 
for a current frame processed after an erased frame is 
declared, the first delta value being equal to the differ- 
ence between the pitch lag value for the current frame 
and a pitch lag value for a frame immediately preceding 
the current frame, the current frame being encoded 
according to a first encoding mode; 

means for decoding at least one delta value for at least one 
frame prior to the current frame and after the frame 
erasure, wherein the at least one delta value is equal to 
the difference between a pitch lag value for the at least 
one frame and a pitch lag value for a frame immediately 
preceding the at least one frame, the at least one frame 
encoded according to a second encoding mode different 
from the first encoding mode; and 

means for subtracting each delta value from the pitch lag 
value for the current frame to generate a pitch lag value 
for the erased frame. 

7. The speech coder of claim 6, further comprising means 
for reconstructing the erased frame to generate a recon- 
structed frame. 

8. The speech coder of claim 7, further comprising means 
for performing a waveform interpolation to smooth any 
discontinuity existing between the current frame and the 
reconstructed frame. 

9. The speech coder of claim 6, wherein the means for 
decoding a pitch lag value and a first delta value comprises 
means for dequantizing in accordance with a relatively 
nonpredictive coding mode. 

10. The speech coder of claim 6, wherein the means for 
decoding at least one delta value comprises means for 
dequantizing in accordance with a relatively predictive 
coding mode. 

11. A subscriber unit configured to compensate for a frame
erasure, comprising: 

a first speech coder configured to decode a pitch lag value 
and a first delta value for a current frame processed 
after an erased frame is declared, the first delta value 
being equal to the difference between the pitch lag 
value for the current frame and a pitch lag value for a 
frame immediately preceding the current frame, the 
current frame encoded according to a first encoding 
mode; 

a second speech coder configured to decode at least one 
delta value for at least one frame prior to the current 
frame and after the frame erasure, wherein the at least 
one delta value is equal to the difference between a 
pitch lag value for the at least one frame and a pitch lag 
value for a frame immediately preceding the at least 
one frame, the at least one frame encoded according to 
a second encoding mode different from the first encod- 
ing mode; and 

a control processor coupled to the first and second speech 
coders and configured to subtract each delta value from 
the pitch lag value for the current frame to generate a 
pitch lag value for the erased frame. 

12. The subscriber unit of claim 11, wherein the control 
processor is further configured to reconstruct the erased 
frame to generate a reconstructed frame. 

13. The subscriber unit of claim 12, wherein the control 
processor is further configured to perform a waveform 
interpolation to smooth any discontinuity existing between 
the current frame and the reconstructed frame. 

14. The subscriber unit of claim 11, wherein the first 
speech coder is configured to decode in accordance with a 
relatively nonpredictive coding mode. 

15. The subscriber unit of claim 11, wherein the second 
speech coder is configured to decode in accordance with a 
relatively predictive coding mode. 

16. The subscriber unit as in claim 11, further comprising: 
a switching means coupled to the control processor, and 

adapted to: 

determine an encoding mode of each received frame; 
and 

couple to the corresponding one of the first and second 
speech coders. 

17. The subscriber unit as in claim 16, further comprising: 
frame erasure detection means coupled to the control 
processor. 

18. An infrastructure element configured to compensate 
for a frame erasure, comprising: 

a processor; and 

a storage medium coupled to the processor and containing 
a set of instructions executable by the processor to 
dequantize a pitch lag value and a first delta value for 
a current frame processed after an erased frame is 
declared, the first delta value being equal to the difference 
between the pitch lag value for the current frame 
and a pitch lag value for a frame immediately preceding 
the current frame, dequantize at least one delta value 
for at least one frame prior to the current frame and 
after the frame erasure, wherein the at least one delta 
value is equal to the difference between a pitch lag 
value for the at least one frame and a pitch lag value for 
a frame immediately preceding the at least one frame, 
and subtract each delta value from the pitch lag value 
for the current frame to generate a pitch lag value for 
the erased frame, 

wherein the current frame is encoded according to a first 
encoding mode, and the at least one frame is encoded 
according to a second encoding mode different from the 
first encoding mode. 

19. The infrastructure element of claim 18, wherein the set 
of instructions is further executable by the processor to 
reconstruct the erased frame to generate a reconstructed 
frame. 



20. The infrastructure element of claim 19, wherein the set 
of instructions is further executable by the processor to 
perform a waveform interpolation to smooth any discontinuity 
existing between the current frame and the reconstructed 
frame. 

21. The infrastructure element of claim 18, wherein the set 
of instructions is further executable by the processor to 
dequantize the pitch lag value and the first delta value for the 
current frame in accordance with a relatively nonpredictive 
coding mode. 

22. The infrastructure element of claim 18, wherein the set 
of instructions is further executable by the processor to 
dequantize the at least one delta value for at least one frame 
prior to the current frame and after the frame erasure in 
accordance with a relatively predictive coding mode. 

23. A method of compensating for a frame erasure in a 
variable rate speech decoder, wherein frames received at the 
speech decoder include a delta value, each delta value 
corresponding to a change in pitch lag from an immediately 
preceding frame, the method comprising: 

declaring an erased frame; 

decoding a first delta value for a first frame, the first frame 
being received after the erased frame is declared, 
wherein the first frame is encoded using a first encoding 
mode; 

decoding a current pitch lag value and a current delta 
value for a current frame processed after receiving the 
first frame, wherein the current frame is encoded using 
a second encoding mode different from the first encoding 
mode; 

generating a first pitch lag value for the first frame based 
on the first delta value and the current pitch lag value; 
and 

subtracting the first and current delta values from the 
current pitch lag value for the current frame to generate 
a pitch lag value for the erased frame. 
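
As a non-limiting sketch of how the steps of this method might be sequenced in a decoder that receives frames one at a time, the following bookkeeping class is offered. The class name `PitchLagTracker`, its method names, and the dictionary it returns are all hypothetical; the sketch assumes only that a frame in the first encoding mode yields a delta value alone, while a frame in the second encoding mode yields both a pitch lag value and a delta value.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class PitchLagTracker:
    """Hypothetical decoder-side bookkeeping for the method steps."""
    erasure_pending: bool = False
    pending_deltas: List[int] = field(default_factory=list)

    def declare_erasure(self) -> None:
        # An erased frame is declared; its pitch lag is unknown for now.
        self.erasure_pending = True
        self.pending_deltas.clear()

    def on_delta_only_frame(self, delta: int) -> None:
        # Frame in the first encoding mode: only a delta is available, so
        # its absolute pitch lag cannot be resolved yet. Remember the delta.
        if self.erasure_pending:
            self.pending_deltas.append(delta)

    def on_full_frame(
        self, lag: int, delta: int
    ) -> Optional[Dict[str, Union[int, List[int]]]]:
        # Frame in the second encoding mode: carries an absolute pitch lag
        # and a delta, which together anchor the backward reconstruction.
        if not self.erasure_pending:
            return None
        running = lag - delta      # lag of the frame just before this one
        recovered: List[int] = []  # lags of delta-only frames, newest first
        for d in reversed(self.pending_deltas):
            recovered.append(running)
            running -= d
        recovered.reverse()        # report oldest first
        self.erasure_pending = False
        self.pending_deltas.clear()
        return {"erased": running, "delta_only": recovered}


if __name__ == "__main__":
    tracker = PitchLagTracker()
    tracker.declare_erasure()
    tracker.on_delta_only_frame(2)       # the "first frame" of the method
    print(tracker.on_full_frame(50, 3))  # the "current frame" of the method
```

In this sketch the pitch lag value for the first frame is resolved only when the current frame arrives, after which the pitch lag value for the erased frame follows by one further subtraction.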

24. The method as in claim 23, wherein the second 
encoding mode is used to encode relatively nonperiodic 
speech. 

25. The method as in claim 24, wherein the first encoding 
mode is used to encode relatively periodic speech. 

26. The method as in claim 25, wherein the first encoding 
mode provides a first bit rate encoding and the second 
encoding mode provides a second bit rate encoding, wherein 
the first bit rate is less than the second bit rate. 

27. An apparatus for compensating for a frame erasure in 
a speech decoder, wherein frames received at the speech 
decoder include a delta value, each delta value corresponding 
to a change in pitch lag from an immediately preceding 
frame, the apparatus comprising: 

means for declaring an erased frame; 

means for decoding a first delta value for a first frame, the 
first frame being received after the erased frame is 
declared, wherein the first frame is encoded using a first 
encoding mode; 

means for decoding a current pitch lag value and a current 
delta value for a current frame processed after receiving 
the first frame, wherein the current frame is encoded 
using a second encoding mode different from the first 
encoding mode; 

means for generating a first pitch lag value for the first 
frame based on the first delta value and the current pitch 
lag value; and 

means for subtracting the first and current delta values 
from the current pitch lag value for the current frame to 
generate a pitch lag value for the erased frame. 
